auto_explore

Machine learning practitioners first need to identify signal in their datasets before building models. The primary goal of auto-explore is to establish a codebase that reduces the effort required to produce a reasonable first-pass exploratory data analysis for a variety of dataset types.
This Python library is a first attempt at automating the process of exploratory data analysis, at least as far as computation and visualization are concerned.
Critical thinking is not included.
In [9]: import numpy as np
In [10]: from auto_explore.datetime import make_calendars
In [11]: year_list = np.arange(2011, 2020)
In [12]: cal_df = make_calendars(year_list, drop_index=False)
In [13]: cal_df.head()
Out[13]:
              month  year    weekday  is_weekday  is_holiday  is_holiday_week
DATE
2011-01-01  January  2011   Saturday           0           0                0
2011-01-02  January  2011     Sunday           0           0                0
2011-01-03  January  2011     Monday           1           0                0
2011-01-04  January  2011    Tuesday           1           0                0
2011-01-05  January  2011  Wednesday           1           0                0

In [10]: from auto_explore.apis import fetch_fred_data
In [11]: series_list = ['SP500', 'NASDAQCOM', 'DJIA', 'RU2000PR']  # FRED series IDs for US equity indices
In [12]: econ_df = fetch_fred_data(series_list)
In [13]: econ_df.head()
Out[13]:
              SP500  NASDAQCOM      DJIA  RU2000PR
DATE
2011-01-03  1271.87    2691.52  11670.75   1984.61
2011-01-04  1270.20    2681.25  11691.18   1952.99
2011-01-05  1276.56    2702.20  11722.89   1976.01
2011-01-06  1273.85    2709.89  11697.31   1966.88
2011-01-07  1271.50    2703.17  11674.76   1957.96

The AutopilotExploratoryAnalysis object makes many methods available on your data with minimal setup.

auto_explore.eda: Interface to Semi-Automation

Simply specify a DataFrame and a list for each of its binary, categorical, numerical and text columns. If applicable, set target_col as a list with one string element.
from auto_explore.eda import AutopilotExploratoryAnalysis
args = (df, bin_cols, cat_cols, num_cols, text_cols)
kwargs = dict(target_col=target_col)
ax = AutopilotExploratoryAnalysis(*args, **kwargs)

This object makes 17 methods available for use on the df supplied as an argument. Check out the most recent code on GitHub.
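The constructor expects the four column lists to be prepared by hand. As a rough starting point, they could be drafted from the DataFrame itself and then corrected manually. The sketch below is not part of auto_explore: `split_columns_by_type`, its `max_categories` threshold, and the toy DataFrame are all hypothetical, and the heuristic (two or fewer distinct values means binary, numeric dtype means numerical, few distinct strings means categorical, everything else is text) is only an assumption about what a sensible first guess looks like.

```python
import pandas as pd


def split_columns_by_type(df, target_col=None, max_categories=20):
    """Hypothetical helper (not in auto_explore): guess the four column lists."""
    skip = set(target_col or [])  # target_col is a one-element list, as in the README
    bin_cols, cat_cols, num_cols, text_cols = [], [], [], []
    for col in df.columns:
        if col in skip:
            continue  # the target is passed separately via target_col
        series = df[col].dropna()
        n_unique = series.nunique()
        if n_unique <= 2:
            bin_cols.append(col)   # also catches constant columns; review by hand
        elif pd.api.types.is_numeric_dtype(series):
            num_cols.append(col)
        elif n_unique <= max_categories:
            cat_cols.append(col)
        else:
            text_cols.append(col)  # many distinct strings: treat as free text
    return bin_cols, cat_cols, num_cols, text_cols


# Toy data, invented purely for illustration.
df = pd.DataFrame({
    "is_member": [0, 1, 1, 0],
    "plan": ["basic", "pro", "team", "basic"],
    "spend": [10.5, 99.0, 250.0, 12.25],
    "review": ["fine", "great value", "too pricey", "works"],
    "churned": [0, 0, 1, 0],
})
target_col = ["churned"]
bin_cols, cat_cols, num_cols, text_cols = split_columns_by_type(
    df, target_col=target_col, max_categories=3
)
```

The resulting lists can then be passed as `args = (df, bin_cols, cat_cols, num_cols, text_cols)` exactly as shown above. Integer-coded categories would land in `num_cols` under this heuristic, so the output is worth a manual check before use.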
auto_explore.viz - Contains all the visualization functions useful in EDA
auto_explore.featexp - Copy as of April 2019 of the featexp main package code with custom changes
auto_explore.apis - Code that fetches data and machine learning models from various sources
auto_explore.notebooks - Formatting code for inside a Jupyter Notebook REPL environment
auto_explore.stats - Currently only houses best_theoretical_distribution but this will expand
auto_explore.datetime - Houses code pertaining to time-series features (e.g. make_calendars)
auto_explore.diligence - Houses code that performs sanity checks of various sorts
full_suite_report mechanism.
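To make the calendar features from auto_explore.datetime more concrete, here is a minimal standard-library sketch that rebuilds the first columns of the make_calendars output shown earlier (month, year, weekday, is_weekday). It is not the library's implementation: `make_calendar_rows` is a hypothetical name, it returns plain dicts rather than a DataFrame, and the holiday columns are left out because the holiday calendar the library uses is not stated here.

```python
from datetime import date, timedelta

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
            "Saturday", "Sunday")


def make_calendar_rows(years):
    """Hypothetical stand-in for a calendar builder: one dict per calendar day."""
    rows = []
    for year in years:
        day = date(year, 1, 1)
        while day.year == year:
            rows.append({
                "DATE": day.isoformat(),
                "month": MONTHS[day.month - 1],
                "year": day.year,
                "weekday": WEEKDAYS[day.weekday()],    # weekday(): Monday is 0
                "is_weekday": int(day.weekday() < 5),  # 1 for Monday to Friday
            })
            day += timedelta(days=1)
    return rows


rows = make_calendar_rows(range(2011, 2013))  # all days in 2011 and 2012
```

Each dict corresponds to one row of the table above, so the list can be turned into a DataFrame indexed by DATE if pandas is available.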